Querying multilevel annotation and alignment for detecting grammatical valence divergencies
Author
Abstract
The concept of valence has been used in machine translation as well as in didactics in order to build valence dictionaries for the respective uses. Most valence dictionaries have been built manually, but given the growing number of parallel resources, it would be desirable to exploit these automatically as a basis for building bilingual valence dictionaries. The present contribution reports on a pilot study on a German-English parallel corpus. In this study, patterns of verb plus grammatical functions were extracted from parallel sentences. The paper reports on some of the basic findings of this extraction, regarding divergencies both in valence patterns and in the syntactic realisation of the predicate, i.e. the verb. These findings set the agenda for further research, which should focus on how to detect semantic shifts of valence carriers in translation and how such shifts affect valence.
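The extraction step described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the study's actual pipeline: the names `VerbInstance`, `valence_pattern` and `count_divergences`, the function labels, and the two toy sentence pairs are all hypothetical, and the verb alignment is assumed to be given.

```python
from collections import Counter
from typing import NamedTuple


class VerbInstance(NamedTuple):
    """A verb lemma plus the grammatical functions it governs in one sentence."""
    lemma: str
    functions: tuple  # e.g. ("subject", "dative_object"); labels are hypothetical


def valence_pattern(verb):
    """Return an order-independent pattern: (lemma, sorted function labels)."""
    return (verb.lemma, tuple(sorted(verb.functions)))


def count_divergences(aligned_pairs):
    """Count aligned verb pairs whose sets of grammatical functions differ."""
    counts = Counter()
    for source, target in aligned_pairs:
        if set(source.functions) != set(target.functions):
            counts[(valence_pattern(source), valence_pattern(target))] += 1
    return counts


# Toy, hand-made alignment (assumed input, not data from the study):
# German "helfen" governs a dative object, English "help" a direct object.
pairs = [
    (VerbInstance("helfen", ("subject", "dative_object")),
     VerbInstance("help", ("subject", "direct_object"))),
    (VerbInstance("sehen", ("subject", "direct_object")),
     VerbInstance("see", ("subject", "direct_object"))),
]

divergences = count_divergences(pairs)
for (source_pattern, target_pattern), n in divergences.items():
    print(source_pattern, "->", target_pattern, n)
```

Comparing function sets rather than ordered lists keeps word-order differences between German and English from being counted as valence divergencies. Divergencies in the syntactic realisation of the predicate itself would need a separate check.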
Similar resources
Discontinuous Constituents: a Problematic Case for Parallel Corpora Annotation and Querying
In this paper, we discuss some linguistic phenomena that pose potential problems for multilevel linguistic annotation of parallel corpora in general and specifically for data encoding with state-of-the-art multilevel corpus querying tools such as CQP. We describe the strategy we use for integrating the standard hierarchical XML representation used to annotate such phenomena in our aligned bilingual...
Semi-Automatic Phonological Annotations of Speech by Grammatical Inference
This paper describes a technique for automatically generating multiple levels of linguistic annotation for a corpus of speech utterances. Using a training corpus of multilevel annotations, a corresponding finite-state representation is automatically constructed by grammatical inference. This finite-state description is then employed as a knowledge component to automatically generate a new multi...
Detecting Grammatical Errors in Machine Translation Output Using Dependency Parsing and Treebank Querying
Despite the recent advances in the field of machine translation (MT), MT systems cannot guarantee that the sentences they produce will be fluent and coherent in both syntax and semantics. Detecting and highlighting errors in machine-translated sentences can help post-editors to focus on the erroneous fragments that need to be corrected. This paper presents two methods for detecting grammatical ...
Robust clause boundary identification for corpus annotation
The paper describes a rule-based system for tagging clause boundaries, implemented for annotating the Estonian Reference Corpus of the University of Tartu, a collection of written texts containing ca 245 million running words and available for querying via Keeleveeb language portal. The system needs information about parts of speech and grammatical categories coded in the word-forms, i.e. it ta...
Consistency Checking for Treebank Alignment
This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucia...
Journal title:
Volume, issue:
Pages: -
Publication date: 2012